For DL research, it is recommended to use Docker to manage CUDA and cuDNN versions. The CUDA/cuDNN combination is very strict: if you install a mismatched pair, you will get tons of errors when running TensorFlow or PyTorch.
NVIDIA GPU Cloud (NGC) is a platform maintained by NVIDIA that hosts Docker images. From it we can get TensorFlow or PyTorch images with specific CUDA and cuDNN versions preinstalled, which helps engineers and researchers make DL development more efficient.
To run NVIDIA Docker images, we need to install Docker and NVIDIA Docker v2 first. In this article, I will note down my practice.
PS: NVIDIA Docker v2 is a piece of software, while the NVIDIA Docker images in NGC are Docker images; they are different things. In this article, I will talk about installing the software.
Install NVIDIA driver
Before installing NVIDIA Docker v2, we need to install the graphics driver so that the graphics card (e.g. a GTX 1080 Ti) is available.
- Commands to install:
# Install driver
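One common way to do this on Ubuntu is the ubuntu-drivers tool, which picks the recommended driver package for the detected card. The sketch below assumes Ubuntu with ubuntu-drivers available and is not necessarily the exact command set used here; run_cmd and install_nvidia_driver are hypothetical helper names, and setting DRY_RUN=1 prints each command instead of running it.

```shell
#!/bin/sh
# Sketch: install the recommended NVIDIA driver on Ubuntu.
# Assumptions: Ubuntu with the ubuntu-drivers tool; run_cmd and
# install_nvidia_driver are hypothetical helper names for illustration.

# Print the command instead of running it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

install_nvidia_driver() {
  run_cmd sudo apt-get update
  # List detected GPUs and the driver package Ubuntu recommends for them
  run_cmd ubuntu-drivers devices
  # Install the recommended driver
  run_cmd sudo ubuntu-drivers autoinstall
  # Reboot so the new kernel module is loaded
  run_cmd sudo reboot
}
```

For example, DRY_RUN=1 install_nvidia_driver previews the steps, and install_nvidia_driver runs them. After the reboot, nvidia-smi on the host should list the card.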
Install Docker CE 18.09.1
Docker CE needs to be installed before NVIDIA Docker v2.
- Commands to install:
# Uninstall old versions
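A minimal sketch of the full sequence, assuming Ubuntu and Docker's apt-repository instructions from the 18.09 era (newer Docker docs store the key in a keyring file because apt-key is deprecated). run_cmd and install_docker_ce are hypothetical helper names; the exact version string for 18.09.1 depends on your release, so list the candidates with apt-cache madison docker-ce and pass the one you want.

```shell
#!/bin/sh
# Sketch: remove old Docker packages and install a pinned Docker CE on Ubuntu.
# run_cmd and install_docker_ce are hypothetical helper names.

# Print the command instead of running it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# $1: full apt version string of docker-ce (from "apt-cache madison docker-ce")
# $2: Ubuntu codename; defaults to the output of "lsb_release -cs"
install_docker_ce() {
  version=$1
  codename=${2:-$(lsb_release -cs)}
  # Uninstall old versions; it is fine if none of them are installed
  run_cmd sudo apt-get remove -y docker docker-engine docker.io containerd runc
  run_cmd sudo apt-get update
  run_cmd sudo apt-get install -y apt-transport-https ca-certificates curl \
    gnupg-agent software-properties-common
  # Add Docker's GPG key and the stable repository
  run_cmd sh -c 'curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -'
  run_cmd sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu $codename stable"
  run_cmd sudo apt-get update
  # Install the pinned engine and CLI (split into two packages since 18.09)
  run_cmd sudo apt-get install -y "docker-ce=$version" "docker-ce-cli=$version" containerd.io
}
```

Usage: install_docker_ce '<version string>' installs for the current release; prefix with DRY_RUN=1 to print the commands first.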
- Check whether it was installed successfully:
# Check docker version
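A sketch of the check, assuming the goal is a daemon at version 18.09.1 or newer that can start a container. docker version --format '{{.Server.Version}}' and the hello-world image are standard Docker; version_ge and check_docker are hypothetical helpers, and the dotted-number comparison is a simplification that ignores suffixes such as "-ce".

```shell
#!/bin/sh
# Sketch: confirm the Docker server version and that containers can start.
# version_ge and check_docker are hypothetical helper names.

# Succeeds (exit status 0) if dotted version $1 >= $2.
# awk's numeric conversion drops non-numeric suffixes such as "-ce".
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

check_docker() {
  docker --version || return 1
  # Version reported by the daemon; fails if the daemon is unreachable
  server=$(docker version --format '{{.Server.Version}}') || return 1
  echo "Docker server version: $server"
  if ! version_ge "$server" 18.09.1; then
    echo "Docker server is older than 18.09.1" >&2
    return 1
  fi
  # Docker's own smoke test: pulls and runs a tiny image
  docker run --rm hello-world
}
```

If check_docker fails with a permission error, see the tip below.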
- Tips:
When running docker, you may get a permission error. You can prefix the command with 'sudo' to get rid of it; a better way is to fix the Docker permissions:
# Add the current user to the docker group
sudo usermod -a -G docker "$USER"
# After setting it, log out and log back in
exit
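To check whether the change is in place, a small sketch (in_group is a hypothetical helper) can look up the user's groups with id:

```shell
#!/bin/sh
# Sketch: check whether a user is in a group such as docker.
# in_group is a hypothetical helper name.

# Succeeds if user $1 is a member of group $2.
in_group() {
  for g in $(id -Gn "$1"); do
    [ "$g" = "$2" ] && return 0
  done
  return 1
}

user=$(id -un)
if in_group "$user" docker; then
  echo "$user is in the docker group"
else
  echo "$user is not in the docker group; run: sudo usermod -a -G docker $user"
fi
```

Note that id -Gn with a user name reads the group database, so it reports the new membership right after usermod, while plain id -Gn shows the groups of the current session. Those only change after logging back in (or in a shell started with newgrp docker), which is why the log out step is needed.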
Install NVIDIA Docker v2
NVIDIA Docker v2 makes the GPU and the host's CUDA driver available inside Docker containers.
- Commands to install:
# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
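A sketch of the whole sequence, assuming an apt-based distribution and the nvidia-docker2 install steps from the nvidia-docker README of that time (NVIDIA has since moved to the NVIDIA Container Toolkit, so current instructions differ). run_cmd and install_nvidia_docker2 are hypothetical helper names; DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: replace nvidia-docker 1.0 (if present) with nvidia-docker2.
# run_cmd and install_nvidia_docker2 are hypothetical helper names.

# Print the command instead of running it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# $1: distribution id such as ubuntu18.04; defaults to $ID$VERSION_ID
#     read from /etc/os-release
install_nvidia_docker2() {
  distribution=${1:-$(. /etc/os-release && echo "$ID$VERSION_ID")}
  # Remove every container that uses nvidia-docker 1.0 volumes, then the package
  run_cmd sh -c 'docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f'
  run_cmd sudo apt-get purge -y nvidia-docker
  # Add the package repository and its signing key
  run_cmd sh -c 'curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -'
  run_cmd sh -c "curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list"
  run_cmd sudo apt-get update
  # Install nvidia-docker2 and reload the Docker daemon configuration
  run_cmd sudo apt-get install -y nvidia-docker2
  run_cmd sudo pkill -SIGHUP dockerd
}
```

Reloading dockerd with SIGHUP picks up the nvidia runtime that the package registers, without restarting running containers.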
- Check whether it was installed successfully:
# Test nvidia-smi with the latest official CUDA image
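A sketch of the test, assuming the nvidia runtime registered by nvidia-docker2. The image tag nvidia/cuda:9.0-base is only an example: an untagged "latest" CUDA image may no longer be published, so pick a tag that exists. gpu_smoke_test is a hypothetical helper, and it checks the host driver first so a driver problem is not mistaken for a Docker problem.

```shell
#!/bin/sh
# Sketch: run nvidia-smi on the host, then inside a CUDA container.
# run_cmd and gpu_smoke_test are hypothetical helper names.

# Print the command instead of running it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# $1: CUDA image to test with; the default tag is only an example
gpu_smoke_test() {
  image=${1:-nvidia/cuda:9.0-base}
  # The host driver must work first; otherwise the container cannot see the GPU
  run_cmd nvidia-smi || return 1
  run_cmd docker run --runtime=nvidia --rm "$image" nvidia-smi
}
```

Both runs should print the same GPU table. With Docker 19.03 or later, docker run --gpus all is the usual alternative to --runtime=nvidia.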
- Tips:
Sometimes testing NVIDIA Docker v2 with nvidia-smi fails with an error. In that case, try uninstalling the NVIDIA driver, reinstalling it, and rebooting.
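A sketch of that tip on Ubuntu, assuming the driver came from an apt package such as nvidia-driver-410 (find yours with dpkg -l | grep nvidia-driver). reinstall_nvidia_driver is a hypothetical helper. It purges only the named package rather than everything matching nvidia-*, because a broad pattern would also remove nvidia-docker2; autoremove is left interactive so you can review what it drops.

```shell
#!/bin/sh
# Sketch: uninstall, reinstall, and reboot for one NVIDIA driver package.
# run_cmd and reinstall_nvidia_driver are hypothetical helper names.

# Print the command instead of running it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# $1: driver package name, e.g. nvidia-driver-410
reinstall_nvidia_driver() {
  if [ -z "$1" ]; then
    echo "usage: reinstall_nvidia_driver PACKAGE" >&2
    return 2
  fi
  pkg=$1
  run_cmd sudo apt-get purge -y "$pkg"
  # Drop dependencies that were pulled in with the driver (prompts first)
  run_cmd sudo apt-get autoremove
  run_cmd sudo apt-get install -y "$pkg"
  # Reboot so the reinstalled kernel module is loaded
  run_cmd sudo reboot
}
```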